REMARKS
I. Status Summary 

Claims 1-23 are pending in the present application. Claims 1 and 11 have been
amended. Therefore, upon entry of this Amendment, Claims 1-23 will be pending. No 
new matter has been introduced by the present amendment. Reconsideration of the 
application as amended and based on the arguments set forth hereinbelow is 
respectfully requested. 

The Examiner indicated that the article titled "Combined Acoustic Echo Control
and Noise Reduction for Hands-Free Telephony," S. Gustafsson et al., Signal
Processing 64 (1998), was not received. Applicant has included herewith a copy of
the article. 

II. Specification 

The abstract of the disclosure is objected to because it contains "Fig. 3". 
(Official Action, page 2.) The phrase "Fig. 3" at the last line of the Abstract has been
deleted. Therefore, applicant respectfully submits that the objection to the abstract of 
the disclosure should be withdrawn. 

III. Claim Rejections Under 35 U.S.C. § 102 
Claims 1, 2, 4, 5, 11, 12, 15-20, 22, and 23 stand rejected under 35 U.S.C.
§ 102(e) as being anticipated by U.S. Patent No. 5,933,495 to Oh (hereinafter, "Oh").

This rejection is respectfully traversed. 

Regarding Claim 1, the Examiner contended that Oh teaches, at Figure 2, a device
for subband noise suppression in telephone devices using a subband adaptive filter
216. (Official Action, page 3.) The Examiner stated that Oh also teaches a control
circuit for adjusting filter coefficients operating in the subband, and that synthesis
filter 234 transforms the subband reduced-noise signal into a full-band signal, at
Figures 2 and 3 and column 4, lines 9-67 of Oh. (Official Action, page 3.)

Upon careful consideration and review of Oh, applicant respectfully submits that 
Oh does not disclose each and every element of the presently claimed subject matter 
and therefore does not anticipate the presently claimed subject matter. Claim 1 recites 
a device for suppressing noise in telephone equipment. Further, Claim 1 recites an 
additional filter with a short propagation time being arranged in the transmission path 
of the telephone equipment. Claim 1 has been amended to recite that the additional 
filter includes adjustable coefficients and a control circuit for adjusting the coefficients. 
The additional filter operates in the full band while the control circuit for adjusting the 
coefficients operates in the subband. Applicant respectfully submits that Oh does not
disclose these features required by amended Claim 1.

According to the Examiner, Figure 2 of Oh teaches adaptive filter 216 including 
a control circuit for adjusting the filter coefficients operating in the subband and 
synthesis filter 234 for transforming the subband reduced-noise signal into a full-band 
signal. (Official Action, page 3.) Referring to Figure 2 of Oh, adaptive filter 216
operates in a subband. Adaptive filter 216 operates similarly to adaptive filter 116,
which is described as operating in the subband. (Oh, column 1, lines 59-63, and
column 4, lines 22-25.) In addition, Oh teaches that the coefficients of adaptive filter
116 are adjusted to provide acoustic echo cancellation. (Oh, column 2, lines 2-4.) The
coefficients are provided via line 228 to adaptive filter 216 in an acoustic echo
canceller block 210. (Oh, Figure 2.) Thus, Oh teaches that filtering occurs in the
subband. In marked contrast, Claim 1 recites that the additional filter operates in the 
full band. For these reasons, Oh does not teach each and every feature of Claim 1 
and, thus, cannot anticipate the claim. 

Claims 2, 4, and 5 depend from Claim 1. Therefore, Claims 2, 4, and 5 include
the features of Claim 1. Thus, the comments presented above relating to amended
Claim 1 apply equally to Claims 2, 4, and 5. For the same reasons provided for Claim
1, it is respectfully submitted that Oh does not anticipate Claims 2, 4, and 5.

The Examiner stated that Claim 11 is similar to Claim 1 and rejected it for the
same reasons. Claim 11 has been amended to place the claim in better method claim
format. Claim 11 recites a method for noise suppression in the telephone equipment.
Claim 11 has been amended to recite a step for filtering the transmitted signal from the
telephone equipment with a short propagation time. In addition, Claim 11 has been
amended to recite a step for controlling the filtering of step (a) with adjustable 
coefficients. Further, Claim 11 recites that the filtering is carried out in the full band, 
while the determination of the coefficients is carried out in the subband. Applicant 
respectfully submits that Oh does not disclose these features recited by amended 
Claim 11. 

As previously stated, Oh teaches adaptive filter 216 operating in a subband
within block 210. Further, Oh teaches that the coefficients of adaptive filter 116 are
provided via line 228 to adaptive filter 216 in block 210. Oh also teaches that block
210 operates in the subband, not the full band. Thus, Oh teaches that filtering occurs
in the subband. In marked contrast, Claim 11 recites that the filtering is carried out in
the full band while the determination of the coefficients is carried out in the subband.
For these reasons, Oh does not teach each and every feature of Claim 11 and, thus, 
cannot anticipate the claim. 

Claims 12, 15-20, 22, and 23 depend from Claim 11. Therefore, Claims 12, 15-20, 22,
and 23 include the features of Claim 11. Thus, the comments presented above relating
to amended Claim 11 apply equally to Claims 12, 15-20, 22, and 23. For the same
reasons provided for Claim 11, it is respectfully submitted that Oh does not anticipate
Claims 12, 15-20, 22, and 23.

For all of the reasons provided above, applicant respectfully requests that
the rejections of Claims 1, 2, 4, 5, 11, 12, 15-20, 22, and 23 under 35 U.S.C. § 102(e) be
withdrawn and the claims allowed at this time.

IV. Claim Rejections Under 35 U.S.C. § 103 
Claims 6-9 and 21 stand rejected under 35 U.S.C. § 103(a) as being 
unpatentable over Oh, as applied to Claims 5 and 20, and further in view of U.S. 
Patent No. 5,757,937 to Itoh et al. (hereinafter, "Itoh"). In addition, Claims 3, 10, 13, 
and 14 stand rejected under 35 U.S.C. § 103(a) as being unpatentable over Oh as 
applied to Claims 1 and 11. These rejections are respectfully traversed.

As previously stated, Oh fails to teach each and every element recited by Claim
1. In addition, applicant respectfully submits that Oh fails to suggest each and every
element recited by Claim 1. Itoh fails to overcome the significant shortcomings of Oh
in disclosing or suggesting the features of amended Claim 1.

Itoh is directed to an acoustic noise suppressor which suppresses signals other 
than speech signals or the like. (Itoh, column 1, lines 4-8.) In addition, Itoh teaches a
noise suppressor including an analysis/discrimination part 20. (Itoh, column 4, lines
51-54.) Part 20 comprises an LPC analysis part 22, an autocorrelation analysis part
23, a maximum value detecting part 24 and a speech/non-speech identification part
25. (Itoh, column 4, lines 55-58.) Further, Itoh teaches that part 30 includes a
psychoacoustically weighted subtraction part 34 for multiplying a noise spectrum Sn(f)
by a psychoacoustic weighting coefficient W(f) and subtracting the psychoacoustically
weighted noise spectrum from spectrum S(f) provided from a frequency analysis part
31. (Itoh, column 5, lines 8-13.) Nowhere does Itoh disclose or suggest a filter
operating in the full band with a control circuit, operating in the subband, for adjusting
the coefficients of the filter. Therefore, for these reasons, Claim 1 is believed to be
patentably distinguished over the combination of Oh and Itoh because the references 
do not disclose or suggest the presently claimed subject matter. 

Claims 3 and 6-10 depend from Claim 1. Therefore, Claims 3 and 6-10 include 
the features of Claim 1. Thus, the comments presented above relating to Claim 1
apply equally to Claims 3 and 6-10. For these reasons, Claims 3 and 6-10 are
believed to be patentably distinguished over the combination of Oh and Itoh because 
the references do not disclose or suggest the presently claimed subject matter. 

As previously stated, Oh fails to teach each and every element recited by Claim 
11. In addition, applicant respectfully submits that Oh fails to suggest each and every 
element recited by Claim 11. Itoh fails to overcome the significant shortcomings of
Oh in disclosing or suggesting the features of amended Claim 11.

As previously stated, Itoh is directed to an acoustic noise suppressor which 
suppresses signals other than speech signals or the like. In addition, Itoh teaches a 
noise suppressor including an analysis/discrimination part 20. Nowhere does Itoh
disclose or suggest that filtering is carried out in the full band while the determination of
the coefficients is carried out in the subband. Therefore, for these reasons, Claim 11
is believed to be patentably distinguished over the combination of Oh and Itoh
because the references do not disclose or suggest the presently claimed subject 
matter. 

Claims 13, 14, and 21 depend from Claim 11. Therefore, Claims 13, 14, and 21 
include the features of Claim 11. Thus, the comments presented above relating to
Claim 11 apply equally to Claims 13, 14, and 21. For these reasons, Claims 13, 14, 
and 21 are believed to be patentably distinguished over the combination of Oh and 
Itoh because the references do not disclose or suggest the presently claimed subject 
matter. 

Applicant respectfully submits that the teachings of Oh and Itoh, either alone or 
in combination, do not teach or suggest each and every feature of the present subject 
matter, and therefore that Claims 3, 6-10, 13, 14, and 21 are not obvious in view of
Oh and Itoh. Applicant, therefore, respectfully requests that the rejection of Claims 3,
6-10, 13, 14, and 21 under 35 U.S.C. § 103(a) be withdrawn and the claims allowed at 
this time. 
CONCLUSION



In light of the above amendments and remarks, it is respectfully submitted that 
the present application is now in proper condition for allowance, and an early notice to 
such effect is earnestly solicited. 

If any small matter should remain outstanding after the Patent Examiner has 
had an opportunity to review the above Remarks, the Patent Examiner is respectfully 
requested to telephone the undersigned patent attorney in order to resolve these 
matters and avoid the issuance of another Official Action. 



The Commissioner is hereby authorized to charge any fees associated with the 
filing of this correspondence to Deposit Account No. 50-0426.
Abstract

In this paper we propose an algorithm for combined acoustic echo control and noise reduction. The algorithm is developed on the basis of a minimum mean-square error criterion and consists of a (conventional) echo canceller and an additional noise and residual echo reduction filter. A special feature of the algorithm is the procedure for estimating the residual echo power spectral density, which relies on the assumption that the phase of the estimated echo is approximately equal to the phase of the true echo. This assumption is verified by experimental results. The residual echo power density estimate and the noise power density estimate are then adaptively combined and used as an argument for a spectral weighting rule such that the residual echo is attenuated and effectively masked by a low level of intentionally left background noise. The paper concludes with experimental results for a typical car environment. © 1998 Elsevier Science B.V. All rights reserved.

Zusammenfassung

In this article, an algorithm for the joint reduction of acoustic echoes and background noise is proposed. The algorithm is developed on the basis of the minimum mean-square error criterion and consists of an echo canceller and an additional noise and residual echo reduction filter. A special feature of the algorithm is the estimation of the power spectral density of the residual echo, which is based on the assumption that the phase of the estimated echo approximately corresponds to the phase of the true echo. This assumption is confirmed experimentally. The estimates of the residual echo power density and of the noise power density are combined adaptively and used as the argument of a spectral weighting rule, so that the residual echo is attenuated and effectively masked by the remaining residual noise. The article concludes with experimental results from a motor vehicle environment. © 1998 Elsevier Science B.V. All rights reserved.

Résumé

In this article we propose an algorithm for simultaneous acoustic echo control and noise reduction. This algorithm is developed on the basis of a mean-square error criterion and consists of a (conventional) echo canceller and an additional noise and residual echo reduction filter. A special feature of this algorithm is the procedure for estimating the power spectral density of the residual echo, which relies on the hypothesis that the phase of the estimated echo is approximately equal to the phase of the true echo. This hypothesis is confirmed by experimental results. The estimate of the residual echo power spectral density and the estimate of the noise power spectral density are then combined adaptively and used as arguments of a spectral weighting rule such that the residual echo is attenuated and effectively masked by a reduced level of intentionally retained background noise. This article concludes with experimental results for a typical automobile environment. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Acoustic echo control; Noise reduction; Wiener filter; Psychoacoustics 



1. Introduction 

The problem of combined acoustic echo cancellation and noise reduction has found considerable interest recently. This interest is fueled by applications in mobile communications where both acoustic echo cancellation and noise reduction are necessary to achieve sufficient quality of the transmitted speech signal. The realization of such a combined system is, however, a challenging task. The difficulties of acoustic echo cancellation are mainly due to the high computational complexity of the echo canceller and to influences which disturb the adaptation of the canceller, such as ambient noise, near end speech, and variations of the acoustic environment. In mobile applications where all these factors play a significant role, it is difficult to reach the echo attenuation required by ITU and ETSI recommendations, see e.g. [1,2], with an echo canceller alone. To achieve sufficient echo reduction, additional voice controlled attenuators or a nonlinear processing device, e.g. a center clipper, can be inserted into the signal paths, which in turn limits the double talk capability of the hands-free system [3] or produces noticeable nonlinear distortions.

The noise reduction task is also not easily solved, since in the typical reverberant environment no 'noise only' reference signal can be obtained which is sufficiently correlated to the noise within the microphone signal. Besides this principal restriction, most automobile manufacturers and mobile communication equipment suppliers favour single microphone solutions. Thus, although a multi-microphone system might yield better noise reduction, a single microphone spectral weighting ('spectral subtraction') technique is often preferred. These methods, however, have well-known disadvantages such as limited performance at low SNR values and artificial sounding residual noise.

In this paper we will present an algorithm for 
combined acoustic echo and noise reduction and 
summarize some of our research results. The algo- 
rithm is developed on the basis of a minimum 
mean-square error criterion. We show that acoustic 
echo control and noise reduction can be combined 
in a true synergy and that the combined approach 
will ease at least some of the above problems. The 
algorithm utilizes a conventional echo canceller of 
reduced order and an adaptive noise and residual 
echo reduction filter. We do not strive to achieve 
complete cancellation with the echo canceller alone 
but we rather use an echo canceller of reduced 
order to decrease the computational complexity 
and improve the overall robustness of the adapta- 
tion process. The required echo reduction is then 
achieved by further attenuating the residual echo 
with a combined noise and residual echo reduction 
filter in the sending path of the hands-free tele- 
phone. Similar to the well-known spectral subtrac- 
tion noise reduction technique [4,5] our approach 
requires an estimate of the power spectral densities 
of the ambient noise and the residual echo after 
echo compensation. The key issue of this paper is to 
show how an estimate of the power spectral density 
of the residual echo might be obtained. 

The echo cancellation and noise reduction problems have been addressed independently for many years (see e.g. [6-10] for reviews of these methods). In recent years it has been recognized, however, that the echo control and noise reduction problem can be tackled in a combined approach [11-19]. It has been shown that the combined treatment yields algorithms which deliver better performance at lower computational cost than systems based on separate algorithms [14,17,18]. Previous methods, however, do not feature explicit residual echo estimation.

The remainder of this paper is organized as follows. In the next section we develop the basis of our approach, i.e. the minimum mean-square error solution to the combined problem. Section 3 discusses our algorithm in detail with special emphasis on the estimation of the residual echo power spectral density and the computation of the spectral weighting function. Finally, Section 4 presents and discusses our experimental results.



2. An optimal solution to the combined problem 

Fig. 1 depicts the basic scenario for hands-free telephony. We assume that all signals are band-limited and digitized, and that the microphone signal y(k) is a linear combination of the near end speech s(k), the near end ambient noise n(k), and the echo signal d(k). The echo signal d(k) typically consists of a component which is linearly related to the loudspeaker signal x(k) and a component which is the result of nonlinear distortions of the loudspeaker signal. In a stationary scenario the former can be, at least theoretically, identified and compensated by an echo canceller. In a practical implementation, where the order of the canceller, the time variance of the acoustic environment, and the ambient noise have a significant influence, the linearly related component cannot be completely eliminated by the canceller and will thus, together with the nonlinear component, contribute to the residual echo.

Fig. 1. Basic hands-free telephony scenario.

To combine acoustic echo cancellation with residual echo and noise reduction, one must ask in which order these processing operations should be performed. Although there are good arguments in favour of performing the noise reduction first, our considerations and experimental results clearly show that the configuration where the echo compensation (EC) precedes the echo and noise reduction (ENR) is preferable. The main advantage of the EC/ENR configuration is that the noise reduction does not have to cope with the disturbing echo signal as it is present in the microphone signal, and that there is no time varying noise reduction filter in the echo path. Besides that, if the echo canceller does not deliver sufficient echo attenuation, the residual echo can be treated similarly to the background noise signal and can be further attenuated by the noise reduction filter. This idea is successfully exploited in a frequency selective echo reduction technique, called 'echo shaping' [20,21], which does not require complete cancellation of the echo by the echo canceller and is easily combined with a noise reduction filter. A disadvantage of the EC/ENR configuration is that the echo canceller has to process noisy signals. As a result, algorithms have been proposed where, besides the noise reduction filter in the sending path, noise reduced signals are used to adapt the echo canceller [13,17].

For the derivation of the optimal solution in the minimum mean-square error sense, we assume that all signals are stationary and that the estimated signal $\hat{s}(k)$ at the output of the combined system is the result of linearly filtering the far end signal x(k) and the microphone signal y(k), i.e.

$$\hat{s}(k) = y(k) * w_1(k) + x(k) * w_2(k), \qquad (1)$$

where $w_1(k)$ and $w_2(k)$ are the impulse responses of two unconstrained (IIR) or constrained (FIR) adaptive filters and $*$ denotes the convolution operation.

It is interesting to note that the very general approach of minimizing the mean-square error $E\{(s(k) - \hat{s}(k))^2\}$ ($E\{\cdot\}$ denotes the expectation operator) suggests an algorithm for the combined system which first computes an echo compensated signal e(k) and then performs a combined noise and residual echo reduction [18,22].

Fig. 2. Block diagram of the acoustic echo cancellation and noise reduction system.

In fact, the unconstrained (IIR) optimal solution is given by (see Appendix A)

$$\hat{s}(k) = \left[ y(k) - x(k) * \mathcal{F}^{-1}\left\{ \frac{R_{xy}(\Omega)}{R_{xx}(\Omega)} \right\} \right] * \mathcal{F}^{-1}\left\{ \frac{R_{ss}(\Omega)}{R_{yy}(\Omega) - R_{yx}(\Omega) R_{xx}^{-1}(\Omega) R_{xy}(\Omega)} \right\}, \qquad (2)$$

where $\mathcal{F}^{-1}\{\cdot\}$ denotes the inverse Fourier transform of discrete time signals and $R_{xx}(\Omega)$, $R_{xy}(\Omega)$, $R_{yx}(\Omega)$, $R_{yy}(\Omega)$, $R_{ss}(\Omega)$ denote the (cross-) power spectral densities of the signals in the subscripts. According to the above solution, the combined problem is fully separable into the optimal echo canceller with frequency response $C(\Omega) = R_{xy}(\Omega) R_{xx}^{-1}(\Omega)$ and a combined residual echo and noise reduction filter with frequency response $H(\Omega) = W_1(\Omega) = R_{ss}(\Omega)\,[R_{yy}(\Omega) - R_{yx}(\Omega) R_{xx}^{-1}(\Omega) R_{xy}(\Omega)]^{-1}$. A block diagram of the resulting system is shown in Fig. 2. If the nonlinear distortions of the loudspeaker signal are neglected, the echo signal d(k) is linearly related to the loudspeaker signal x(k) and the unconstrained optimal echo canceller will deliver a perfect estimate of the echo signal. In this case we find that $R_{yy}(\Omega) - R_{yx}(\Omega) R_{xx}^{-1}(\Omega) R_{xy}(\Omega) = R_{ss}(\Omega) + R_{nn}(\Omega)$ holds, and the echo and noise reduction postfilter reduces to the well-known Wiener filter for a signal with additive noise. In general, and especially in the constrained FIR case, however, a residual echo component $b(k) = d(k) - \hat{d}(k)$ will remain after the non-perfect echo compensation.



In case the echo signal d(k) cannot be completely cancelled (because of a non-perfect canceller and/or additional nonlinear components), the residual echo and noise reduction filter $H(\Omega)$ is given by

$$H(\Omega) = \frac{R_{ss}(\Omega)}{R_{ss}(\Omega) + R_{nn}(\Omega) + R_{bb}(\Omega)}, \qquad (3)$$

where $R_{bb}(\Omega)$ denotes the power spectral density of the residual echo b(k). Similar to well-known noise reduction techniques, the estimation of the optimal filter $H(\Omega)$ requires estimates of the noise power spectral density $R_{nn}(\Omega)$ and the residual echo power spectral density $R_{bb}(\Omega)$. It should be noted that, in contrast to the ambient noise n(k), the residual echo b(k) is a speech-like signal. Thus, the estimation procedures for $R_{nn}(\Omega)$ and $R_{bb}(\Omega)$ are entirely different. Furthermore, unlike the optimal unconstrained solution for the above stationary scenario, any real adaptive implementation using a limited amount of data will be sensitive to ambient noise. Since there is no noise reduction before echo cancellation, the optimal solution as suggested by Eq. (2) requires a very robust echo canceller, especially in a car environment. If the residual echo power spectral density is known with sufficient accuracy, performance deficiencies of the canceller can be counterbalanced by the filter $H(\Omega)$.



3. An algorithm for combined echo and noise reduction

In this section we present an algorithm which closely follows the minimum mean-square error approach of the previous section. Since the main focus of this section is on the adaptation of the optimal filter $H(\Omega)$, we will not discuss the echo canceller in detail. The echo canceller in our combined system was developed by Antweiler [23] and utilizes an adaptive step-size control algorithm due to Frenzel [24]. This canceller has proven to be very robust even in noisy and time variant environments. The canceller is described in detail in [23].
3.1. SNR estimation and weighting rules

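State-of-the-art noise reduction algorithms are based on a priori and a posteriori signal-to-noise ratio (SNR) estimates [5,25]. To facilitate the subsequent discussion of estimation procedures, we introduce a frame index (m) and discrete frequencies $\Omega_i = 2\pi i/M$, $i \in \{0, 1, 2, \ldots, M-1\}$, with M being the FFT frame size. $R_{ss}^{(m)}(\Omega_i)$, $R_{nn}^{(m)}(\Omega_i)$ and $R_{bb}^{(m)}(\Omega_i)$ denote the power spectral densities, or short-term estimates thereof, of the signals in the subscripts for the mth frame. The a priori SNR for the combined residual echo and noise reduction problem is then given by

$$\mathrm{SNR}^{(m)}(\Omega_i) = \frac{R_{ss}^{(m)}(\Omega_i)}{R_{nn}^{(m)}(\Omega_i) + R_{bb}^{(m)}(\Omega_i)}, \qquad (4)$$

which can be rewritten in terms of the individual SNR values related to the ambient noise n(k) and the residual echo b(k),

$$\mathrm{SNR}^{(m)}(\Omega_i) = \frac{1}{[\mathrm{SNR}_n^{(m)}(\Omega_i)]^{-1} + [\mathrm{SNR}_b^{(m)}(\Omega_i)]^{-1}}, \qquad (5)$$

where $\mathrm{SNR}_n^{(m)}(\Omega_i)$ and $\mathrm{SNR}_b^{(m)}(\Omega_i)$ are given by

$$\mathrm{SNR}_n^{(m)}(\Omega_i) = \frac{R_{ss}^{(m)}(\Omega_i)}{R_{nn}^{(m)}(\Omega_i)} \qquad (6)$$

and

$$\mathrm{SNR}_b^{(m)}(\Omega_i) = \frac{R_{ss}^{(m)}(\Omega_i)}{R_{bb}^{(m)}(\Omega_i)}. \qquad (7)$$

The optimal filter $H(\Omega)$ in Eq. (3) is now easily expressed in terms of the a priori SNR:

$$H^{(m)}(\Omega_i) = \frac{\mathrm{SNR}^{(m)}(\Omega_i)}{1 + \mathrm{SNR}^{(m)}(\Omega_i)}. \qquad (8)$$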
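Eqs. (3) to (8) describe the same gain in two ways: directly from the power spectral densities, and via the individual a priori SNRs. A short Python check, assuming the PSDs are already available as per-bin arrays (the values below are arbitrary and the function names are hypothetical), confirms that combining the individual SNRs and mapping the result through SNR/(1 + SNR) reproduces $R_{ss}/(R_{ss} + R_{nn} + R_{bb})$ bin by bin:

```python
import numpy as np

def gain_from_psds(R_ss, R_nn, R_bb):
    """H = R_ss / (R_ss + R_nn + R_bb), evaluated per frequency bin."""
    return R_ss / (R_ss + R_nn + R_bb)

def gain_from_snrs(snr_n, snr_b):
    """Combine the individual a priori SNRs, then map SNR to H = SNR / (1 + SNR)."""
    snr = 1.0 / (1.0 / snr_n + 1.0 / snr_b)
    return snr / (1.0 + snr)

# Arbitrary example PSD values for three frequency bins.
R_ss = np.array([4.0, 1.0, 0.1])
R_nn = np.array([1.0, 1.0, 1.0])
R_bb = np.array([1.0, 0.5, 0.01])

H_psd = gain_from_psds(R_ss, R_nn, R_bb)
H_snr = gain_from_snrs(R_ss / R_nn, R_ss / R_bb)   # individual SNRs w.r.t. noise and residual echo
```

Because the combined SNR is a harmonic-mean-like combination, it never exceeds the smaller of the two individual SNRs, so whichever disturbance is stronger in a bin dominates the attenuation there.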



Similar to [5,18,19], the individual a priori SNR values are estimated by a 'decision directed' approach. The a priori SNR related to the ambient noise n(k) is given by

$$\mathrm{SNR}_n^{(m)}(\Omega_i) = (1 - \alpha_n)\, P\!\left(\mathrm{SNR}_{n,\mathrm{post}}^{(m)}(\Omega_i) - 1\right) + \alpha_n \frac{|H^{(m-1)}(\Omega_i)\, E^{(m-1)}(\Omega_i)|^2}{R_{nn}^{(m-1)}(\Omega_i)}, \qquad (9)$$

where $P(x) = \frac{1}{2}(|x| + x)$, $\mathrm{SNR}_{n,\mathrm{post}}^{(m)}(\Omega_i)$ denotes the a posteriori SNR with respect to the ambient noise n(k),

$$\mathrm{SNR}_{n,\mathrm{post}}^{(m)}(\Omega_i) = \frac{|E^{(m)}(\Omega_i)|^2}{R_{nn}^{(m)}(\Omega_i)}, \qquad (10)$$

$\alpha_n$ is a step-size parameter, and $E^{(m)}(\Omega_i)$ is the discrete Fourier transform of the compensated signal e(k). A similar expression holds for the SNR related to the residual echo b(k).
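The decision-directed recursion can be sketched per frequency bin as follows. This is a minimal Python illustration, not the authors' code, and it rests on several assumptions: the noise PSD estimate is given for every frame, the second term is normalized by the previous frame's noise PSD, and, for simplicity, the previous-frame gain is computed from the noise-only SNR, whereas in the combined system it would be the combined gain $H^{(m-1)}(\Omega_i)$. The function names and the value of `alpha` are hypothetical.

```python
import numpy as np

def half_wave(x):
    """P(x) = (|x| + x) / 2, i.e. max(x, 0) elementwise."""
    return 0.5 * (np.abs(x) + x)

def decision_directed_snr(E, R_nn, alpha=0.98):
    """Decision-directed a priori SNR w.r.t. the ambient noise, frame by frame.

    E:    complex DFT frames E^(m)(Omega_i) of the compensated signal, shape (frames, bins)
    R_nn: noise PSD estimate for every frame, same shape
    Returns (a priori SNRs, gains); the gain is the noise-only H = SNR / (1 + SNR),
    a simplification of the combined gain.
    """
    E = np.asarray(E, dtype=complex)
    R_nn = np.asarray(R_nn, dtype=float)
    snr_prio = np.zeros(E.shape)
    gain = np.zeros(E.shape)
    previous = np.zeros(E.shape[1])  # |H^(m-1) E^(m-1)|^2 / R_nn^(m-1); zero before frame 0
    for m in range(E.shape[0]):
        snr_post = np.abs(E[m]) ** 2 / R_nn[m]                                   # a posteriori SNR
        snr_prio[m] = (1 - alpha) * half_wave(snr_post - 1) + alpha * previous  # a priori SNR
        gain[m] = snr_prio[m] / (1 + snr_prio[m])
        previous = np.abs(gain[m] * E[m]) ** 2 / R_nn[m]
    return snr_prio, gain

# Two frames, one bin, constant input: the a priori SNR rises as the recursion fills up.
snr, H = decision_directed_snr(np.array([[3.0], [3.0]]), np.ones((2, 1)), alpha=0.9)
```

A large `alpha` makes the a priori SNR follow the previous frame's enhanced spectrum, which smooths the gain over time; this smoothing is what the decision-directed approach is usually credited with for reducing fluctuating residual noise.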



3.2. Power spectral density estimation

For the combined reduction of residual echo and noise, separate estimates of the power spectral densities of the background noise n(k) and the residual echo b(k) have to be computed, as the characteristics of the speech-like residual echo differ greatly from those of the noise. The noise power spectral density $R_{nn}^{(m)}(\Omega_i)$ can be estimated by the 'minimum statistics' or 'spectral minima tracking' methods as outlined in [26,27]. These methods have the advantage that the noise power spectral density is estimated continuously, eliminating the need for a voice activity detector. They are also able to track slow variations of the noise power density, which is vital for the noise reduction algorithm to perform well if $R_{nn}^{(m)}(\Omega_i)$ changes during speech activity.
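To make the idea of minimum tracking concrete, here is a heavily simplified Python sketch in the spirit of [26,27], not the method described there: it recursively smooths the periodogram of each bin and takes the minimum of the smoothed values over a sliding window of past frames. Speech bursts raise the smoothed power only temporarily, so the windowed minimum stays at the noise floor without any voice activity detector. The function name, smoothing constant, window length and bias factor are hypothetical; the published methods choose the smoothing adaptively and derive a proper bias compensation.

```python
import numpy as np

def min_tracking_noise_psd(periodograms, alpha=0.85, window=96, bias=1.0):
    """Simplified minimum tracking over frames.

    periodograms: array of shape (frames, bins) holding |E^(m)(Omega_i)|^2.
    Returns a noise PSD estimate with the same shape.
    """
    P = np.asarray(periodograms, dtype=float)
    smoothed = np.empty_like(P)
    estimate = np.empty_like(P)
    state = P[0].copy()
    for m in range(P.shape[0]):
        if m > 0:
            state = alpha * state + (1.0 - alpha) * P[m]   # first-order recursive smoothing
        smoothed[m] = state
        start = max(0, m - window + 1)
        # minimum of the smoothed power over the last `window` frames
        estimate[m] = bias * smoothed[start:m + 1].min(axis=0)
    return estimate

# Constant noise floor of 1.0 with a loud "speech" burst in frames 10-19.
P = np.ones((30, 4))
P[10:20] = 100.0
R_nn_est = min_tracking_noise_psd(P, alpha=0.85, window=30)
```

The window length trades tracking speed against robustness: it must be longer than typical speech activity so that a noise-only stretch is always inside the window, but short enough to follow slow changes of the noise level.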

The estimation of the residual echo power spectral density $R_{bb}^{(m)}(\Omega_i)$ is much more involved, since b(k) is a speech-like signal and thus stationary over time periods of only 20-500 ms. The main idea of our estimation procedure is therefore to derive $R_{bb}^{(m)}(\Omega_i)$ from quantities which change less rapidly over time. We achieve this by modelling the residual echo as the output of a linear system with the echo d(k) as its input and by assuming that the transfer function $F^{(m)}(\Omega_i)$ of this possibly non-causal and time varying system is statistically independent of the input signal d(k). This assumption is, of course, not entirely fulfilled, because the echo cancellation, which is modelled by $F^{(m)}(\Omega_i)$, is controlled by an adaptive algorithm which itself depends on the echo d(k). But over short time intervals, when the room impulse response does not change rapidly and the echo canceller works in a steady-state condition, one can assume $F^{(m)}(\Omega_i)$ to be almost constant and thus to be independent of d(k).

Fig. 3. Interpretation of the echo compensation as a transfer function $F^{(m)}(\Omega_i)$.

Fig. 4. Frequency domain vector diagram of the echo $D^{(m)}(\Omega_i)$, the estimated echo $\hat{D}^{(m)}(\Omega_i)$ and the residual echo $B^{(m)}(\Omega_i)$.

Fig. 5. Time domain echo compensation: the magnitude of the echo, the magnitude error, and the phase error (in radians), for a sample speech frame.

The approach is illustrated in Fig. 3. In terms of short-term frame oriented spectral analysis it leads to the identities

$$B^{(m)}(\Omega_i) = D^{(m)}(\Omega_i) - \hat{D}^{(m)}(\Omega_i), \qquad (11)$$

$$B^{(m)}(\Omega_i) = F^{(m)}(\Omega_i)\, D^{(m)}(\Omega_i). \qquad (12)$$

For our estimation procedure, we also assume (and verify by measurement) that the transfer function $F^{(m)}(\Omega_i)$ can be approximated by a real valued function. In Fig. 4 the vectors $D^{(m)}(\Omega_i)$, $\hat{D}^{(m)}(\Omega_i)$ and $B^{(m)}(\Omega_i)$ are plotted in the complex plane for a given frequency $\Omega_i$. The misalignment between $D^{(m)}(\Omega_i)$ and $\hat{D}^{(m)}(\Omega_i)$ can be expressed by the magnitude error $|\hat{D}^{(m)}(\Omega_i)| - |D^{(m)}(\Omega_i)|$ and the phase error $\varphi = \arg\{\hat{D}^{(m)}(\Omega_i)\} - \arg\{D^{(m)}(\Omega_i)\}$. To justify the assumption, we note that, for some fixed phase error $\varphi$, the more the magnitude of the estimated echo $|\hat{D}^{(m)}(\Omega_i)|$ deviates from the magnitude of the true echo $|D^{(m)}(\Omega_i)|$, the smaller the phase deviation $\psi$ between the vectors $D^{(m)}(\Omega_i)$ and $B^{(m)}(\Omega_i)$. Thus, in this case our assumption gives a good approximation, which becomes even better the smaller $\varphi$ is. It has been verified by simulations that, due to the compensation mechanism, the phase error $\varphi$ is indeed small at frequencies where the echo is present. A sample frame of $|D^{(m)}(\Omega_i)|$, of the magnitude error, and of the phase error is shown in Fig. 5. A large phase error can only be found at frequencies where $|D^{(m)}(\Omega_i)|$ is very small.
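The geometric argument can be checked with a few complex numbers. The hypothetical helper below places the true echo at $D = 1$ and the estimate at $\hat{D} = a\,e^{j\varphi}$, then measures how far $B = D - \hat{D}$ deviates from the line through $D$. The deviation is taken modulo $\pi$, on the assumption that a real-valued $F$ may also be negative (B anti-parallel to D when the echo is overestimated). For a fixed phase error, a larger magnitude mismatch gives a smaller deviation, which is the case in which a real-valued transfer function is a good approximation.

```python
import numpy as np

def residual_direction_error(ratio, phase_error):
    """Angle (rad) between B = D - D_hat and D, for D = 1 and D_hat = ratio * exp(j*phase_error).

    Measured modulo pi, because a real-valued F may be negative (B anti-parallel to D).
    """
    D = 1.0 + 0.0j
    D_hat = ratio * np.exp(1j * phase_error)
    psi = abs(np.angle((D - D_hat) / D))
    return min(psi, np.pi - psi)

phi = 0.1  # fixed phase error in radians (arbitrary example value)
for ratio in (0.9, 0.5, 1.5):
    print(f"|D_hat|/|D| = {ratio}: deviation = {residual_direction_error(ratio, phi):.3f} rad")
```

With a nearly correct magnitude (ratio 0.9) the residual points far away from D, while with a clear magnitude mismatch (0.5 or 1.5) it stays close to the line through D; this matches the remark that the phase of B is unreliable only when the residual echo itself is small.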

If the magnitude error between $\hat{D}^{(m)}(\Omega_i)$ and $D^{(m)}(\Omega_i)$ is very small, the phase of $B^{(m)}(\Omega_i)$ might be entirely different from the phase of $D^{(m)}(\Omega_i)$. This, however, does not pose a serious problem, since the residual echo is then small and we are only interested in the magnitude of the residual echo. The magnitude of $B^{(m)}(\Omega_i)$ will then be underestimated, which results in less echo reduction but not in additional distortions of the near end speech signal.

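From Eq. (12) we can write the power spectral density of the residual echo, $R_{bb}^{(m)}(\Omega_i)$, as a function of the power spectral density of the echo, $R_{dd}^{(m)}(\Omega_i)$:

$$R_{bb}^{(m)}(\Omega_i) = |F^{(m)}(\Omega_i)|^2 R_{dd}^{(m)}(\Omega_i) = \left(F^{(m)}(\Omega_i)\right)^2 R_{dd}^{(m)}(\Omega_i). \qquad (13)$$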

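By combining Eqs. (11) and (12), both $R_{dd}^{(m)}(\Omega_i)$ and $R_{bb}^{(m)}(\Omega_i)$ can be written as functions of the transfer function $F^{(m)}(\Omega_i)$ and the power spectral density of the estimated echo, $R_{\hat{d}\hat{d}}^{(m)}(\Omega_i)$:

$$R_{dd}^{(m)}(\Omega_i) = \frac{1}{\left(1 - F^{(m)}(\Omega_i)\right)^2}\, R_{\hat{d}\hat{d}}^{(m)}(\Omega_i), \tag{14}$$

$$R_{bb}^{(m)}(\Omega_i) = \left(\frac{F^{(m)}(\Omega_i)}{1 - F^{(m)}(\Omega_i)}\right)^2 R_{\hat{d}\hat{d}}^{(m)}(\Omega_i), \tag{15}$$

which are well defined for any $F^{(m)}(\Omega_i) \neq 1$; in practice this means that the echo canceller delivers a non-zero echo estimate. The problem of estimating $R_{bb}^{(m)}(\Omega_i)$ is thereby converted into the estimation of the transfer function $F^{(m)}(\Omega_i)$.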
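Eqs. (14) and (15) translate directly into code. The sketch below evaluates them for one frequency bin with a real-valued $F^{(m)}(\Omega_i)$, as implied by Eq. (13); the function name is hypothetical, and the check against Eq. (13) only illustrates that the three equations are mutually consistent.

```python
def echo_psds_from_F(F: float, R_dhat: float) -> tuple[float, float]:
    """Return (R_dd, R_bb) for one frequency bin via Eqs. (14) and (15).

    F      -- real-valued transfer function F^(m)(Omega_i)
    R_dhat -- PSD of the estimated echo, R_{dhat dhat}^(m)(Omega_i)
    """
    if F == 1.0:
        # F = 1 means D_hat = 0: no echo estimate, Eqs. (14)/(15) undefined
        raise ValueError("Eqs. (14) and (15) require F != 1")
    R_dd = R_dhat / (1.0 - F) ** 2            # Eq. (14)
    R_bb = (F / (1.0 - F)) ** 2 * R_dhat      # Eq. (15)
    return R_dd, R_bb


if __name__ == "__main__":
    R_dd, R_bb = echo_psds_from_F(0.1, 0.81)
    print(R_dd, R_bb)                          # approximately 1.0 and 0.01
    # Consistency with Eq. (13): R_bb = F^2 * R_dd
    print(abs(R_bb - 0.1 ** 2 * R_dd) < 1e-12)
```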

If neither near end speech nor near end noise is present, i.e. in a noise-free single talk situation where $y(k) = d(k)$ and $e(k) = b(k)$, $F^{(m)}(\Omega_i)$ can be calculated from Eq. (12):

$$\left(F^{(m)}(\Omega_i)\right)^2 = \frac{R_{ee}^{(m)}(\Omega_i)}{R_{yy}^{(m)}(\Omega_i)}. \tag{16}$$

However, as this situation seldom prevails, another solution must be found.

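Assuming statistical independence between the near end speech $s(k)$, the noise $n(k)$, and the echo $d(k)$ or the residual echo $b(k)$, we can write the power spectral densities of the microphone signal $y(k)$ and the compensated signal $e(k)$ as

$$R_{yy}^{(m)}(\Omega_i) = R_{ss}^{(m)}(\Omega_i) + R_{nn}^{(m)}(\Omega_i) + R_{dd}^{(m)}(\Omega_i), \tag{17}$$

$$R_{ee}^{(m)}(\Omega_i) = R_{ss}^{(m)}(\Omega_i) + R_{nn}^{(m)}(\Omega_i) + R_{bb}^{(m)}(\Omega_i). \tag{18}$$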



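Combining the above equations with Eqs. (14) and (15), we arrive at an expression for $F^{(m)}(\Omega_i)$ that can be calculated from measurable quantities (see Appendix B):

$$F^{(m)}(\Omega_i) = \frac{R_{yy}^{(m)}(\Omega_i) - R_{ee}^{(m)}(\Omega_i) - R_{\hat{d}\hat{d}}^{(m)}(\Omega_i)}{R_{yy}^{(m)}(\Omega_i) - R_{ee}^{(m)}(\Omega_i) + R_{\hat{d}\hat{d}}^{(m)}(\Omega_i)}. \tag{19}$$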
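The practical value of Eq. (19) is that the near end terms $R_{ss}^{(m)}(\Omega_i)$ and $R_{nn}^{(m)}(\Omega_i)$ cancel in $R_{yy}^{(m)}(\Omega_i) - R_{ee}^{(m)}(\Omega_i)$. The sketch below checks this numerically for one bin: it builds consistent PSDs from Eqs. (13), (14), (17) and (18) for a chosen $F$ and then recovers $F$ with Eq. (19). The helper names and numbers are hypothetical test values, not data from the experiments.

```python
def estimate_F(R_yy: float, R_ee: float, R_dhat: float) -> float:
    """Eq. (19): estimate F^(m)(Omega_i) from measurable PSDs of one bin."""
    if R_dhat == 0.0:
        raise ValueError("Eq. (19) is only valid for a non-zero echo estimate")
    diff = R_yy - R_ee  # near end speech and noise PSDs cancel here
    return (diff - R_dhat) / (diff + R_dhat)


def synthesize_psds(R_ss: float, R_nn: float, R_dd: float, F: float):
    """Build (R_yy, R_ee, R_dhat) that are consistent with a given F."""
    R_bb = F ** 2 * R_dd               # Eq. (13)
    R_dhat = (1.0 - F) ** 2 * R_dd     # Eq. (14), solved for R_dhat
    R_yy = R_ss + R_nn + R_dd          # Eq. (17)
    R_ee = R_ss + R_nn + R_bb          # Eq. (18)
    return R_yy, R_ee, R_dhat


if __name__ == "__main__":
    for F_true in (0.05, 0.25, 0.6):
        # the recovered value does not depend on R_ss or R_nn
        print(F_true, estimate_F(*synthesize_psds(0.3, 0.2, 1.0, F_true)))
```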



Eq. (19) is only valid if $R_{\hat{d}\hat{d}}^{(m)}(\Omega_i) \neq 0$. Under some circumstances, for example when the estimated echo power spectral density is very weak compared to the power spectral density of the microphone signal, Eq. (19) can, owing to estimation errors and finite numerical accuracy, lead to wrong results. Therefore, potential errors must be excluded from the calculation. In our algorithm this is achieved in four steps:

1. Limit $F^{(m)}(\Omega_i)$ to some reasonable range.

2. Split the frequency range into $N < M$ subbands.

3. In each subband $\nu$, calculate the mean value $\bar{F}_{\nu}^{(m)}$ of those $F^{(m)}(\Omega_i)$ where $R_{\hat{d}\hat{d}}^{(m)}(\Omega_i)$ is not too small.

4. At each frequency $\Omega_i$, set $F^{(m)}(\Omega_i)$ to the corresponding mean value $\bar{F}_{\nu}^{(m)}$.

Fig. 6. A sample of the estimated transfer function $F^{(m)}(\Omega_i)$.

The transfer function $F^{(m)}(\Omega_i)$ estimated this way is then used to estimate $R_{bb}^{(m)}(\Omega_i)$ via Eq. (15). It has a frequency-dependent step shape, as illustrated in Fig. 6.
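The four post-processing steps can be sketched as follows for one frame. The text does not give the range used in step 1, the subband boundaries of step 2, or a value for "not too small" in step 3, so `f_range`, the equal-width contiguous subbands and `min_rdhat` below are hypothetical choices; so is the fallback value used when a subband contains no reliable bin.

```python
def smooth_F(F: list[float], R_dhat: list[float], n_subbands: int,
             f_range: tuple[float, float] = (0.0, 0.9),
             min_rdhat: float = 1e-6) -> list[float]:
    """Apply steps 1-4 to the raw per-bin estimates of Eq. (19).

    f_range, min_rdhat and the equal-width subband split are assumptions.
    """
    M = len(F)
    if not 0 < n_subbands < M:
        raise ValueError("need 0 < N < M subbands")
    lo, hi = f_range
    # Step 1: limit F to a reasonable range.
    limited = [min(max(f, lo), hi) for f in F]
    # Step 2: split the M bins into N contiguous subbands (integer edges).
    edges = [v * M // n_subbands for v in range(n_subbands + 1)]
    out = [0.0] * M
    for v in range(n_subbands):
        bins = range(edges[v], edges[v + 1])
        # Step 3: mean over bins whose echo-estimate PSD is not too small.
        reliable = [limited[i] for i in bins if R_dhat[i] >= min_rdhat]
        # Fallback (assumption): no reliable bin -> lower range limit.
        mean = sum(reliable) / len(reliable) if reliable else lo
        # Step 4: assign the subband mean to every bin of the subband.
        for i in bins:
            out[i] = mean
    return out
```

The output is piecewise constant across frequency, which is the step shape seen in Fig. 6.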



3.3. Limiting of estimated SNR

With the residual echo power spectral density $R_{bb}^{(m)}(\Omega_i)$ and the noise power spectral density $R_{nn}^{(m)}(\Omega_i)$ estimated as described in the previous section, we can now determine the SNR values and compute the spectral weighting coefficients. The auditory impression can be improved if the SNR estimates of the residual echo and the noise are limited and balanced with respect to each other. Also, it is often desirable to leave a low level of natural sounding residual noise in the processed signal. This can be achieved by limiting the estimated a priori SNR to a minimum threshold $T_n$:

$$\mathrm{SNR}_{\mathrm{prio},n}^{(m),\lim}(\Omega_i) = \max\left(\mathrm{SNR}_{\mathrm{prio},n}^{(m)}(\Omega_i),\, T_n\right),$$
$$\mathrm{SNR}_{\mathrm{post},n}^{(m),\lim}(\Omega_i) = \max\left(\mathrm{SNR}_{\mathrm{post},n}^{(m)}(\Omega_i),\, T_n\right). \tag{20}$$



Especially at SNR values below $T_n$ the limiting will have a significant effect. Thus, the stronger the background noise, the higher the noise level in the processed signal will be. This is in fact an advantage, as speech enhancement generally performs less well in a low-SNR environment. The limiting prevents too high an attenuation and therefore also reduces the distortions of the near end speech, which might otherwise lose intelligibility. A proper range is $0.01 \le T_n \le 0.1$, where the chosen value ultimately depends on criteria such as the desired noise reduction and the admissible speech distortion. If the threshold $T_n$ is chosen too high, the amount of noise reduction will be very low; if it is chosen too low, the limiting will have almost no influence.

Now consider an equivalent limiting of the SNR referring to the residual echo, using a constant threshold $T_b$. This will have the effect that some residual echo is always left in the signal $\hat{s}(k)$. Of course, this is not desirable in a noise-free situation, as the echo might then be audible. However, when noise is present, some limiting is necessary, as otherwise the attenuation by the filter $H$ might be too high, leading to disturbing modulations of the residual noise whenever the far end speaker is active.

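We therefore propose to attenuate the residual echo $b(k)$ where it is most likely not masked by the residual noise. In the processed signal $\hat{s}(k)$ only the near end speech and an attenuated, natural sounding background noise should be audible, but no echo. This can be achieved by a frequency dependent limiting with a threshold $T_b^{(m)}(\Omega_i)$:

$$\mathrm{SNR}_{\mathrm{prio},b}^{(m),\lim}(\Omega_i) = \max\left(\mathrm{SNR}_{\mathrm{prio},b}^{(m)}(\Omega_i),\, T_b^{(m)}(\Omega_i)\right),$$
$$\mathrm{SNR}_{\mathrm{post},b}^{(m),\lim}(\Omega_i) = \max\left(\mathrm{SNR}_{\mathrm{post},b}^{(m)}(\Omega_i),\, T_b^{(m)}(\Omega_i)\right), \tag{21}$$

where $T_b^{(m)}(\Omega_i)$ is a function of the chosen threshold $T_n$ and of the power spectral densities of the residual echo and the noise, for example

$$T_b^{(m)}(\Omega_i) = T_n\,\frac{2\,R_{nn}^{(m)}(\Omega_i)}{R_{nn}^{(m)}(\Omega_i) + R_{bb}^{(m)}(\Omega_i)}. \tag{22}$$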

With this limiting function, $T_b^{(m)}(\Omega_i) \to 0$ if $R_{nn}^{(m)}(\Omega_i) \to 0$, thus permitting complete attenuation of the residual echo. When a strong noise is present that already masks the residual echo, $R_{nn}^{(m)}(\Omega_i) \gg R_{bb}^{(m)}(\Omega_i)$ leads to $T_b^{(m)}(\Omega_i) \approx 2T_n$, effectively preventing too high an attenuation. Finally, if $R_{nn}^{(m)}(\Omega_i) = R_{bb}^{(m)}(\Omega_i)$, then $T_b^{(m)}(\Omega_i) = T_n$ and the SNRs are all limited to the same level. In an idealized stationary condition this would lead to a combined SNR of exactly half the value of the individual SNRs (see Eq. (5)).
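The limiting of Eqs. (20) to (22) reduces to a few lines per frequency bin. In this sketch the SNR arguments stand for whichever a priori or a posteriori estimate is being limited; the function names and the behaviour for $R_{nn} + R_{bb} = 0$ (returning a zero threshold) are assumptions not specified in the text.

```python
def echo_threshold(T_n: float, R_nn: float, R_bb: float) -> float:
    """Eq. (22): frequency-dependent threshold T_b^(m)(Omega_i) for one bin."""
    if R_nn + R_bb == 0.0:
        return 0.0  # assumption: nothing to mask and nothing to limit
    return 2.0 * T_n * R_nn / (R_nn + R_bb)


def limit_snrs(snr_n: float, snr_b: float, T_n: float,
               R_nn: float, R_bb: float) -> tuple[float, float]:
    """Eqs. (20) and (21): floor the noise and residual-echo SNR estimates."""
    return max(snr_n, T_n), max(snr_b, echo_threshold(T_n, R_nn, R_bb))


if __name__ == "__main__":
    T_n = 0.05  # inside the suggested range 0.01 ... 0.1
    print(echo_threshold(T_n, 0.0, 1.0))   # no noise: 0.0, echo may be fully removed
    print(echo_threshold(T_n, 1e6, 1.0))   # strong noise: close to 2 * T_n
    print(echo_threshold(T_n, 1.0, 1.0))   # equal PSDs: T_n
```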

With the above adaptive limiting and the subsequent combination of the different a priori and a posteriori SNRs, the speech enhancement algorithm works well over a wide range of input signal-to-noise conditions, effectively reducing the background noise and the residual echo with only minor impacts on the near end speech quality.



4. Experimental results 

Our algorithm was evaluated in a car environment with single talk, double talk, and various ambient noise levels. The sampling frequency was 8000 Hz. In the car, which had a reverberation time of about 70 ms, the combined system with an echo canceller of only $N_c = 200$ filter taps gives satisfactory performance. The residual echo and noise reduction filter $H(\Omega_i)$ was realized in the frequency domain by means of framewise processing with a 512-point FFT with 50% overlap. The frames consisted of 256 data samples multiplied by a Hamming window and were zero padded to the full FFT length.
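The framing described above can be sketched with the standard library alone. The hop size of 128 samples is our assumption (we read "50% overlap" as overlap of the 256-sample data frames); the symmetric Hamming window and the function names are likewise illustrative choices. Each returned frame is ready for a 512-point FFT.

```python
import math

FRAME_LEN = 256   # data samples per frame (from the text)
FFT_LEN = 512     # FFT length (from the text)
HOP = 128         # assumption: 50% overlap of the 256-sample data frames


def hamming(n: int) -> list[float]:
    """Symmetric Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def frames(signal: list[float]) -> list[list[float]]:
    """Split a signal into windowed frames zero padded to FFT_LEN samples."""
    win = hamming(FRAME_LEN)
    out = []
    for start in range(0, len(signal) - FRAME_LEN + 1, HOP):
        chunk = signal[start:start + FRAME_LEN]
        windowed = [x * w for x, w in zip(chunk, win)]
        out.append(windowed + [0.0] * (FFT_LEN - FRAME_LEN))  # zero padding
    return out
```

At the 8000 Hz sampling rate, a 256-sample frame spans 32 ms and a 128-sample hop 16 ms.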

All experimental conditions were evaluated using informal listening tests and the instrumental assessment method described below.



4.1. Instrumental assessment

Our evaluation method is based on a separate processing of the acoustic echo and the near end signal [28,29]. This evaluation scheme requires, however, that the near end speech and noise signals are recorded independently of the echo signal. The simulation setup is shown in Fig. 7.

Based on the signals shown in Fig. 7, the following measures can be defined:



Fig. 7. Signal model for the instrumental evaluation of the combined algorithm.



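• the time average of the echo return loss enhancement of the compensator C,

$$\mathrm{ERLE}_C = \frac{1}{k_2 - k_1 + 1}\sum_{k=k_1}^{k_2} 10\log_{10}\frac{E\{d^2(k)\}}{E\{b^2(k)\}}, \tag{23}$$

where $k_1$ is the index of the first sample and $k_2$ is the index of the last sample of the measurement;

• the time average of the echo attenuation of the combined system (compensator C + filter H),

$$\mathrm{ERLE}_{CH} = \frac{1}{k_2 - k_1 + 1}\sum_{k=k_1}^{k_2} 10\log_{10}\frac{E\{d^2(k)\}}{E\{\tilde{b}^2(k)\}}, \tag{24}$$

where $\tilde{b}(k)$ is the residual echo $b(k)$ filtered with the filter H;

• the distortion of the near end signal caused by the filter H, as measured by the segmental SNR,

$$\mathrm{SEGSNR} = \frac{1}{K_{\mathrm{SNR}>0}}\sum_{\lambda=0}^{K-1}\max\left(\mathrm{SNR}_{\mathrm{seg}}(\lambda),\, 0\right), \tag{25}$$

with

$$\mathrm{SNR}_{\mathrm{seg}}(\lambda) = 10\log_{10}\frac{\sum_{i=\lambda N_W}^{(\lambda+1)N_W-1} s^2(i)}{\sum_{i=\lambda N_W}^{(\lambda+1)N_W-1}\left[s(i) - \tilde{s}(i+N_H)\right]^2}, \tag{26}$$

where $\tilde{s}(k)$ is the near end speech $s(k)$ filtered with the filter H, $N_H$ is the delay caused by the filter H, $N_W$ the segment length, $K$ the total number of segments, and $K_{\mathrm{SNR}>0}$ the number of frames with $\mathrm{SNR}_{\mathrm{seg}}(\lambda) > 0$;

• the noise reduction NR of the combined residual echo and noise reduction filter H,

$$\mathrm{NR} = \frac{1}{k_2 - k_1 + 1}\sum_{k=k_1}^{k_2} 10\log_{10}\frac{E\{n^2(k)\}}{E\{\tilde{n}^2(k)\}}, \tag{27}$$

where $\tilde{n}(k)$ is the noise $n(k)$ filtered with the filter H.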



The expected values $E\{\cdot\}$ are computed as ensemble averages across 16 phonetically balanced sentences. Since $H(\Omega)$ is a linear phase filter, the SEGSNR criterion measures only the amplitude distortions of the near end signal.
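As an illustration of the ERLE measures of Eqs. (23) and (24) and of the segmental SNR, the sketch below evaluates them from sample lists. It assumes that the signals of Fig. 7 (for example the echo, the residual echo, the filtered residual echo, the near end speech and its filtered version) are available as separate recordings, approximates each expectation $E\{\cdot\}$ by a mean over runs (one run per sentence), and skips segments whose SNR is undefined; the function names and that skipping rule are hypothetical. The noise reduction follows the same pattern as `erle_db`, with the noise and the filtered noise in place of the echo signals.

```python
import math


def erle_db(d_runs: list[list[float]], b_runs: list[list[float]],
            k1: int, k2: int) -> float:
    """Time average of 10*log10(E{d^2(k)} / E{b^2(k)}) over k = k1..k2.

    Each E{.} is approximated by the mean over runs (one run per sentence).
    Pass the H-filtered residual echo as b_runs to obtain ERLE_CH.
    """
    total = 0.0
    for k in range(k1, k2 + 1):
        E_d = sum(run[k] ** 2 for run in d_runs) / len(d_runs)
        E_b = sum(run[k] ** 2 for run in b_runs) / len(b_runs)
        total += 10.0 * math.log10(E_d / E_b)
    return total / (k2 - k1 + 1)


def segsnr_db(s: list[float], s_tilde: list[float], N_H: int, N_W: int) -> float:
    """Segmental SNR of the H-filtered near end speech s_tilde.

    N_H is the delay of H and N_W the segment length. Only segments with a
    positive SNR enter the average; silent or error-free segments are skipped
    (assumption, since their SNR is undefined).
    """
    K = min(len(s), len(s_tilde) - N_H) // N_W   # total number of segments
    positive = []
    for lam in range(K):
        seg = range(lam * N_W, (lam + 1) * N_W)
        sig = sum(s[i] ** 2 for i in seg)
        err = sum((s[i] - s_tilde[i + N_H]) ** 2 for i in seg)
        if sig > 0.0 and err > 0.0:
            snr = 10.0 * math.log10(sig / err)
            if snr > 0.0:
                positive.append(snr)
    return sum(positive) / len(positive) if positive else 0.0
```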
4.2. Single talk experiments

The single talk situation was evaluated using 16 phonetically balanced sentences and additional car noise. Three different echo-to-noise ratios were considered: 0, 10 and 25 dB. In Fig. 8, the mean echo return loss enhancement (ERLE) for the compensator ($\mathrm{ERLE}_C$), for the combined system ($\mathrm{ERLE}_{CH}$), and the noise reduction are plotted as a function of the echo-to-noise ratio at the microphone input. As expected, less noise in the microphone input results in better echo attenuation. In the noiseless case, the overall echo attenuation is about 50 dB, which is sufficient to fulfill ITU and ETSI recommendations. As the level of noise increases, less echo attenuation is necessary and desirable, since some of the residual echo is masked by the noise and too high an attenuation would result in a fluctuating residual noise signal with a negative impact on the perceived quality. Interestingly, the echo canceller still performs well at an echo-to-noise ratio of 0 dB. This is because of its relatively short length, which makes it more robust in noisy environments. This good performance is also a precondition for the residual echo reduction to work satisfactorily.

4.3. Double talk experiments 

In our double talk experiments, eight phonetically balanced sentences were used both for the near end speech and for the far end speech. Fig. 9 shows the ERLE of the compensator ($\mathrm{ERLE}_C$), the ERLE of the combined system ($\mathrm{ERLE}_{CH}$), the noise reduction NR, and the segmental SNR of the processed near end speech (SEGSNR) as a function of the echo-to-noise ratio. The near end speech had about the same power as the far end speech. Compared to the single talk case, the overall echo reduction is now about 20 dB lower. Because of the double talk, the compensator now converges much more slowly. Again, since the near end signals mask some of the residual echo, a higher overall echo attenuation is not desirable, as it would only lead to more distortions of the near end speech signal.

Fig. 8. Simulation results for single talk situations. $\mathrm{ERLE}_C$: ERLE with compensator C; $\mathrm{ERLE}_{CH}$: ERLE with compensator C and filter H; NR: noise reduction.

Fig. 9. Simulation results for double talk situations. $\mathrm{ERLE}_C$: ERLE with compensator C; $\mathrm{ERLE}_{CH}$: ERLE with compensator C and filter H; NR: noise reduction; SEGSNR: segmental SNR for near end speech.

Informal listening experiments confirm that the remaining echo is indeed almost inaudible. Of course, double talk results in slight distortions of the near end speech signal, but since the far end speaker is active at the same time, these have only a minor effect on the perceived quality.



5. Conclusions 

In this paper we have shown theoretically how the tasks of acoustic echo attenuation and noise reduction can be combined. We propose a structure consisting of an acoustic echo canceller followed by an adaptive postfilter. The postfilter is able to attenuate not only the background noise but also the residual echo left by the echo canceller. Thus, the length of the echo canceller can be reduced.

To be applicable in the considered hands-free telephony environment, the system must achieve significant acoustic echo and noise reduction over a wide range of signal-to-noise conditions. Furthermore, a high near end speech quality and a natural sounding residual background noise are of great importance.

To achieve this, the power spectral densities of the residual echo and the background noise, which have inherently different characteristics, are estimated separately. A further new feature of our algorithm is an adaptive combination of the separate estimates, such that a low level of background noise remains. It has been found that, by carefully balancing the residual echo and noise attenuation, the psychoacoustic masking effects of the ear contribute to a significant improvement of the perceived quality of the processed near end signal.

The complexity of the algorithm depends mainly on the order $N_c$ of the time domain echo canceller C. For $N_c = 200$, the echo canceller and the postfilter H need approximately the same number of operations per input sample. The total number of operations is then estimated to be less than 20 MIPS on a typical DSP.
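As a rough reading of this budget, assuming one instruction per operation and the 8000 Hz sampling rate of Section 4:

$$\frac{20 \times 10^{6}\ \text{instructions/s}}{8000\ \text{samples/s}} = 2500\ \text{instructions per input sample},$$

so that, with the load shared roughly equally between C and H, each part has on the order of 1250 instructions per sample available.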

The extra delay introduced by the filter H depends on the implementation of the analysis/synthesis structure, especially the FFT length and the zero padding, and is about 256 samples in our system.
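Expressed in time at the 8000 Hz sampling rate used in the experiments, this delay amounts to

$$\frac{256\ \text{samples}}{8000\ \text{samples/s}} = 32\ \text{ms}.$$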
Appendix A. Derivation of the optimal structure for a combined noise and residual echo reduction filter

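Differentiating the expectation $E\{e^2(k)\} = E\{(s(k) - \hat{s}(k))^2\}$ of the estimation error with respect to the unknown filter coefficients and setting the result to zero,

$$\frac{\partial E\{e^2(k)\}}{\partial w_1(i)} = \frac{\partial E\{e^2(k)\}}{\partial w_2(i)} = 0 \quad \forall i, \tag{A.1}$$

where

$$\hat{s}(k) = y(k) * w_1(k) + x(k) * w_2(k),$$

leads to the normal equations

$$-r_{ys}(i) + w_1(i) * r_{yy}(i) + w_2(i) * r_{yx}(i) = 0,$$
$$-r_{xs}(i) + w_1(i) * r_{xy}(i) + w_2(i) * r_{xx}(i) = 0, \tag{A.2}$$

where $r_{xy}(i) = E\{x(k)\,y(k+i)\}$ denotes the correlation function of the signals in the subscripts. Eq. (A.2) yields in the frequency domain

$$W_1(\Omega)R_{yy}(\Omega) + W_2(\Omega)R_{yx}(\Omega) = R_{ys}(\Omega),$$
$$W_1(\Omega)R_{xy}(\Omega) + W_2(\Omega)R_{xx}(\Omega) = R_{xs}(\Omega) = 0, \tag{A.3}$$

which readily gives the desired result

$$W_1(\Omega) = \frac{R_{ys}(\Omega)}{R_{yy}(\Omega) - R_{xx}^{-1}(\Omega)\,R_{xy}(\Omega)\,R_{yx}(\Omega)} = H(\Omega). \tag{A.4}$$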
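Eq. (A.4) can be checked numerically at a single frequency: the sketch below solves (A.3) for $W_1(\Omega)$ and $W_2(\Omega)$ and verifies that both normal equations hold. The spectral values are arbitrary hypothetical test numbers, chosen with $R_{yx}(\Omega) = R_{xy}^{*}(\Omega)$ as for real-valued signals.

```python
def optimal_filters(R_yy: complex, R_yx: complex, R_xy: complex,
                    R_xx: complex, R_ys: complex) -> tuple[complex, complex]:
    """Solve the frequency-domain normal equations (A.3) at one frequency.

    Uses R_xs = 0 (far end signal uncorrelated with near end speech).
    Returns (W1, W2), where W1 equals H(Omega) of Eq. (A.4).
    """
    W1 = R_ys / (R_yy - R_xy * R_yx / R_xx)  # Eq. (A.4)
    W2 = -W1 * R_xy / R_xx                   # second line of (A.3)
    return W1, W2


if __name__ == "__main__":
    R_xy = 0.5 + 0.3j
    R_yy, R_yx, R_xx, R_ys = 3.0, R_xy.conjugate(), 2.0, 1.0
    W1, W2 = optimal_filters(R_yy, R_yx, R_xy, R_xx, R_ys)
    print(abs(W1 * R_yy + W2 * R_yx - R_ys))  # residual of first equation, ~0
    print(abs(W1 * R_xy + W2 * R_xx))         # residual of second equation, ~0
```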



Appendix B. Derivation of Eq. (19) 

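Recall Eqs. (17) and (18):

$$R_{yy}^{(m)}(\Omega_i) = R_{ss}^{(m)}(\Omega_i) + R_{nn}^{(m)}(\Omega_i) + R_{dd}^{(m)}(\Omega_i),$$
$$R_{ee}^{(m)}(\Omega_i) = R_{ss}^{(m)}(\Omega_i) + R_{nn}^{(m)}(\Omega_i) + R_{bb}^{(m)}(\Omega_i).$$

Subtracting Eq. (18) from Eq. (17) and substituting Eqs. (14) and (15) yields

$$R_{yy}^{(m)}(\Omega_i) - R_{ee}^{(m)}(\Omega_i) = R_{dd}^{(m)}(\Omega_i) - R_{bb}^{(m)}(\Omega_i) = \frac{1 - \left(F^{(m)}(\Omega_i)\right)^2}{\left(1 - F^{(m)}(\Omega_i)\right)^2}\, R_{\hat{d}\hat{d}}^{(m)}(\Omega_i) = \frac{1 + F^{(m)}(\Omega_i)}{1 - F^{(m)}(\Omega_i)}\, R_{\hat{d}\hat{d}}^{(m)}(\Omega_i)$$

$$\Leftrightarrow \left[1 - F^{(m)}(\Omega_i)\right]\left[R_{yy}^{(m)}(\Omega_i) - R_{ee}^{(m)}(\Omega_i)\right] = \left[1 + F^{(m)}(\Omega_i)\right] R_{\hat{d}\hat{d}}^{(m)}(\Omega_i).$$

Collecting the terms in $F^{(m)}(\Omega_i)$ leads directly to the result, Eq. (19):

$$F^{(m)}(\Omega_i) = \frac{R_{yy}^{(m)}(\Omega_i) - R_{ee}^{(m)}(\Omega_i) - R_{\hat{d}\hat{d}}^{(m)}(\Omega_i)}{R_{yy}^{(m)}(\Omega_i) - R_{ee}^{(m)}(\Omega_i) + R_{\hat{d}\hat{d}}^{(m)}(\Omega_i)}.$$

Once again note that $F^{(m)}(\Omega_i) \neq 1$ means that there is some estimate of the echo at the frequency $\Omega_i$, i.e. $R_{\hat{d}\hat{d}}^{(m)}(\Omega_i) \neq 0$ and $R_{dd}^{(m)}(\Omega_i) \neq R_{bb}^{(m)}(\Omega_i)$. Therefore, Eq. (19) is only valid for $R_{\hat{d}\hat{d}}^{(m)}(\Omega_i) \neq 0$.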